1.0 Authorization and Notification

This request to conduct a peer review of the International Space Station (ISS) proposal to use 
Bayesian methodology for updating Mean Time Between Failure (MTBF) for ISS Orbital 
Replaceable Units (ORU) was submitted to the NASA Engineering and Safety Center (NESC) 
on September 20, 2005. 

The request was presented and the plan approved by the NESC Review Board (NRB) on October 
6, 2005. This final report with recommendations to the ISS Program was presented to the NRB 
on November 17, 2005. 


NESC Request No. 05-163-E

NASA Engineering and Safety Center Technical Consultation Report
Document #: RP-05-131
Version: 1.0
Title: New Method for Updating Mean Time Between Failure for ISS Orbital Replaceable Units


2.0 Signature Page 


Consultation Team Members 


Vickie S. Parsons, NESC

Vitali Volovoi

James Womack
3.0 List of Team Members


Name                  Title                     Affiliation

Vickie Parsons        NESC Systems Engineer     LaRC
Vitali Volovoi        Statistical Consultant    Georgia Institute of Technology, School of Aerospace Engineering
James Womack          Statistical Consultant    Aerospace Corporation

Support

Cindy Bruno-Miller    Program Analyst, MTSO     LaRC
Elizabeth Holthofer   Technical Writer          ViGYAN, Inc., LaRC
4.0 Executive Summary

The ISS Program requested a peer review of their proposal to use operational data to update the 
MTBF for ISS ORUs by applying Bayesian methodology. The results were requested by 
October 20, 2005 in order to be available during the process of reworking the current ISS flight 
manifest. 

After a review of the documentation provided by the ISS Program and discussion with the
principal contributors to the proposal, the statistical peer review team concluded that applying
Bayesian methodology is an appropriate approach for updating MTBF estimates. However,
several assumptions used in this particular application should be refined in order to preclude
overly optimistic estimates. Specifically, the selection of the prior distribution parameter α, the
justification for excluding degradation, and the categorization of ORUs need to be revisited.
5.0 Consultation Plan


This consultation consisted of a peer review by statistical experts of the proposed application of
Bayesian methodology to revising MTBF estimates for ISS Orbital Replaceable Units. Each
member of the review team analyzed the following documents and files:

• Fisher-Price Report (Space Station Freedom External Maintenance Task Team Final
Report), July 1990.

• Bayesian Methodology and How It Could Apply to ISS-USOS External ORU MTBFs,
Jean Ni, August 29, 2005.

• Estimation of Prior Distribution Parameter α Used in Bayesian Methodology -2, Jean Ni,
September 13, 2005.

• Estimation of Prior Distribution Parameters Used in Bayesian Methodology, Jean Ni,
June 23, 2005.

• Hubble Space Telescope Reliability Assessment, July 2002 Model, Helen Wong,
November 21, 2002.

• Various Excel spreadsheets for Bayesian calculations by ISS.

• Modeling Analysis Data Set (MADS) for ISS ORU manufacturers’ estimated failure rates.


A telephone conference was conducted between the peer review members and representatives of 
the ISS Program responsible for the proposal, where questions and details were pursued. Two 
additional telephone conferences and email exchanges resulted in the final peer review results 
and recommendations that follow. 
6.0 Description of the Problem, Proposed Solutions, and Risk Assessment

The ISS Program proposed the application of Bayesian methodology to revise (increase) the 
MTBF estimates for ORUs. The risk inherent in increasing the MTBF estimates would be a 
failure to have necessary replacements if the MTBFs are overly optimistic. During the analyses 
of the proposal, the statistical peer review team identified several areas of concern, explained in 
the following sections. 
7.0 Data Analysis

Selection of the prior distribution parameter α is the most important and difficult aspect of
applying the Bayesian methodology to updating reliability estimates. The values of α under
consideration for use in ISS ORUs are between 0.1963 and 1.5582, which are smaller than
typical values used in other space programs.¹ These values were derived from various random
failure rate estimates (the shape parameter) provided in the Fisher-Price report² and
manufacturer data on individual components (the scale parameter) contained in the MADS
dataset.³ Small values of α correspond to wide confidence bounds for the prior estimated failure
rates and lead to discounting the prior information. Values smaller than 1.368 are outside of a
meaningful range, as described in the detailed calculations in Appendix B.

Determining the best value of α would require determining the uncertainty in the original failure
rate of each individual ORU. All of the assumptions and uncertainties used to estimate each part
failure rate in the ORU would need to be modeled by a statistical distribution, from which it
would be possible to determine the distribution of the ORU failure rate. This, however, is a very
difficult and time-consuming process.

A simpler approach for selecting an α value was used by the Hubble and Tracking and Data
Relay Satellite (TDRS) Programs. They used an α value of 2.2068, which was determined by
setting the probability that λ is less than λ₀/5 equal to 5 percent.⁴ This models the assumption
that there is only a small probability that the original failure rates are larger than five times the
true failure rate. Use of 5 percent is a typical rule of thumb for statistical significance. The
value of 1.56 recommended by the ISS Program yields a corresponding probability of 10 percent.
The 2.2068 value of α has undergone extensive review by the Hubble Program. The prior
distribution for α = 2.2068 (see Figure 1 in Appendix B) has a much more reasonable shape for
modeling the uncertainty in the original failure rate. It has also proven to provide good reliability
updates for the TDRS Program, in line with updates calculated by other acceptable reliability
estimating methods.⁵
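
The 5 percent and 10 percent figures quoted above can be checked numerically. The following is a minimal sketch, assuming the gamma prior described in Appendix B with mean λ₀ (so that β = α/λ₀); the function names are illustrative only and are not taken from the ISS or Hubble analyses. Because βλ then follows a unit-scale gamma distribution, the probability that λ falls below λ₀/5 depends only on α.

```python
import math


def gamma_cdf(x, shape):
    """Regularized lower incomplete gamma function P(shape, x), by power series."""
    if x <= 0.0:
        return 0.0
    # Series: sum over n of x^n / (shape * (shape + 1) * ... * (shape + n))
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (shape + n)
        total += term
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))


def prob_rate_below_fraction(alpha, fraction=0.2):
    """P(lambda < fraction * lambda0) for a gamma prior with mean lambda0.

    With beta = alpha / lambda0, beta * lambda is Gamma(alpha, 1), so the
    result does not depend on lambda0 itself.
    """
    return gamma_cdf(alpha * fraction, alpha)


for alpha in (2.2068, 1.56):
    print(f"alpha = {alpha}: P(lambda < lambda0/5) = {prob_rate_below_fraction(alpha):.4f}")
```

Under these assumptions the probabilities come out near 0.05 for α = 2.2068 and just under 0.10 for α = 1.56, consistent with the values stated in the text.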

Exponential distributions dominated the reliability world for decades due to the simplicity of the 
systems analysis with exponentially distributed failures and the need for only one parameter, 


1 Ni, Jean. “Estimation of Prior Distribution Parameter “alpha” Used in Bayesian Methodology-2.” PowerPoint 
Presentation, Johnson Space Center, 13 September 2005. 

2 Space Station Freedom External Maintenance Task Team, Final Report. NASA, July 1990. 

3 Modeling Analysis Data Set (MADS). 

4 Wong, Helen. “Hubble Space Telescope Reliability Assessment, July 2002 Model.” The Aerospace Corporation,
Report No. TOR - 2003 (2154) - 2352, 21 November 2002.

5 Womack, James. “Tracking and Data Relay Satellite Reliability Models.” The Aerospace Corporation, Report No. 
TOR-2005 (2141) - 3876, 26 January 2005. 
failure rate or its inverse, mean time between failures (MTBF), as opposed to at least two
required to define other distributions. The latter consideration was (and still is) of great
importance when there is a shortage of the data needed to characterize the failure pattern
statistically. The use of an exponential distribution to describe “random” failures can be
complemented by deterministic life limits for units that are known to degrade with time.
However, for some failure modes (especially for mechanical components), the degradation is
gradual, and so is the corresponding increase in failure rate. This continuous increase in the
failure rate is not limited to the end of life, and is usually described, depending on the type of
degradation, either by a Weibull distribution with the shape parameter β > 1 or by a LogNormal
distribution. The ISS proposal assumes an exponential distribution of the failure frequencies,
thus ignoring the degradation of individual units. The MADS dataset indicates that
manufacturers’ estimates of the Weibull shape parameter β range from 2.5 to 5.0, which raises the
concern that degradation should not be ignored. The calculations for inclusion of failures
obeying Weibull distributions into the Bayesian updating procedure can be found in Womack’s
report.⁶
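
To illustrate why a Weibull shape parameter above 1 signals degradation, the sketch below evaluates the standard Weibull hazard (instantaneous failure rate) at a few ages. The characteristic life of 100,000 hours is a hypothetical value chosen only for illustration; the shape values 2.5 and 5.0 are the ends of the range reported for MADS, and a shape of 1.0 recovers the constant-rate exponential case.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


scale_hours = 100_000.0  # hypothetical characteristic life, not from MADS
ages = (10_000.0, 50_000.0, 100_000.0)

for shape in (1.0, 2.5, 5.0):
    rates = [weibull_hazard(t, shape, scale_hours) for t in ages]
    print(shape, ["%.3e" % r for r in rates])
```

With a shape of 1.0 the hazard is the same at every age, while for 2.5 and 5.0 it grows steadily with age, which is the behavior an exponential model cannot represent.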

Several assumptions within the Fisher-Price report were accepted by the ISS Program without
explanation.⁷ One concern for this peer review team was the logic associated with grouping
large sets of diverse components into four categories. The Fisher-Price method for aggregation
of failure estimates for the analog study was not explained in the portion of the Fisher-Price
report provided to this peer review team.⁸ In addition, the final Bayesian methodology is
planned to be applied to all ORUs uniformly.

This ISS proposal, as currently presented or with the recommendations cited in this report, can
lead to counterproductive results by removing conservatism from the estimation of the failure
rates, unless it is supplemented with a rigorous risk analysis of over- and under-predicting
the number of spares required. This is because under-predicting the number of spares
has greater consequences than over-predicting, and the initial conservatism in estimating MTBF
compensated for this asymmetry.
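
The sensitivity of a spares decision to the MTBF estimate can be sketched as follows. This is an illustration only, not part of the ISS analysis: it assumes a constant failure rate with immediate replacement, so that the number of failures over a period is Poisson distributed, and the MTBF values, period, and spares count are all hypothetical.

```python
import math


def prob_shortage(mtbf_hours, period_hours, spares):
    """P(failures > spares) when failures ~ Poisson(period_hours / mtbf_hours)."""
    mean = period_hours / mtbf_hours
    p_covered = sum(
        math.exp(-mean) * mean ** k / math.factorial(k) for k in range(spares + 1)
    )
    return 1.0 - p_covered


period = 87_600.0  # hypothetical planning period: ten years of operation
for mtbf in (100_000.0, 200_000.0):  # hypothetical original and revised MTBF
    print(mtbf, round(prob_shortage(mtbf, period, spares=1), 3))
```

Under these assumptions, doubling the MTBF estimate substantially lowers the computed chance of running short with one spare. If the larger MTBF is too optimistic, the real shortage risk stays near the higher figure, which is why the revision needs an accompanying risk analysis.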


6 Ibid. 

7 Space Station Freedom External Maintenance Task Team, Final Report. NASA, July 1990. 

8 Ibid. 
8.0 Findings, Observations, and Recommendations

F-1 Small values of α correspond to wide confidence bounds for prior estimated failure rates
and lead to discounting random failure rate estimates (the shape parameter, from the Fisher-Price
report⁹) and manufacturer data on individual components (the scale parameter) contained in the
MADS dataset. Values smaller than 1.368 for α are outside of a meaningful range.

F-2 The MADS indicates that manufacturers’ estimates of β range from 2.5 to 5.0, which raises
the concern that degradation should not be ignored.

F-3 The use of an exponential distribution complemented with establishing deterministic life 
limits for units known to degrade with time can lead to overly optimistic prediction, since units 
can degrade gradually, resulting in continuously increasing failure rate. 

F-4 The risk inherent in increasing the MTBF estimates would be a failure to have necessary
replacements if the MTBFs are less conservative.

R-1. Within the Bayesian methodology for MTBF estimates, use a value of α = 2.2068 until
additional analysis can show cause for another value. (F-1)

R-2. Validate the β values within the MADS. (F-2)

R-3. If β values within the MADS are indeed greater than 2.0, adjust the Bayesian methodology
to include consideration of those degradation values, even though the unit life cycle is
considerably greater than the expected life cycle of the ISS. (F-2)

R-4. Investigate some of the more critical components individually for MTBF. (F-3) 

R-5. Rather than rely solely on MTBF estimates, maintain cognizance of the associated 
risk analyses and schedule replacements accordingly. (F-4) 


9 Ibid. 
9.0 Lessons Learned


None. 
10.0 Definition of Terms


Corrective Actions 


Changes to design processes, work instructions, workmanship practices, 
training, inspections, tests, procedures, specifications, drawings, tools, 
equipment, facilities, resources, or material that result in preventing, 
minimizing, or limiting the potential for recurrence of a problem. 


Finding 


A conclusion based on facts established during the assessment/inspection 
by the investigating authority. 


Lessons Learned 


Knowledge or understanding gained by experience. The experience may 
be positive, as in a successful test or mission, or negative, as in a mishap 
or failure. A lesson must be significant in that it has real or assumed 
impact on operations; valid in that it is factually and technically correct; 
and applicable in that it identifies a specific design, process, or decision 
that reduces or limits the potential for failures and mishaps, or reinforces a 
positive result. 


Observation 


A factor, event, or circumstance identified during the 
assessment/inspection that did not contribute to the problem, but if left 
uncorrected has the potential to cause a mishap, injury, or increase the 
severity should a mishap occur. 


Problem 


The subject of the independent technical assessment/inspection. 


Recommendation


An action identified by the assessment/inspection team to correct a root
cause or deficiency identified during the investigation. The
recommendations may be used by the responsible C/P/P/O in the
preparation of a corrective action plan.


Root Cause


Along a chain of events leading to a mishap or close call, the first causal
action or failure to act that could have been controlled systemically either
by policy/practice/procedure or individual adherence to
policy/practice/procedure.
11.0 List of Acronyms


HST     Hubble Space Telescope
ISS     International Space Station
MADS    Modeling Analysis Data Set
MTBF    Mean Time Between Failure
NASA    National Aeronautics and Space Administration
NESC    NASA Engineering and Safety Center
NRB     NESC Review Board
ORU     Orbital Replaceable Unit
TDRS    Tracking and Data Relay Satellite
13.0 Minority Report

There are no dissenting opinions in this report. 
VOLUME II: APPENDICES

Appendix A. NESC Request Form (PR-003-FM-01) 
NASA Engineering and Safety Center
Request Form


Submit this ITA/I Request, with associated artifacts attached, to: nrbexecsec@nasa.gov, or to
NRB Executive Secretary, M/S 105, NASA Langley Research Center, Hampton, VA 23681


Section 1: NESC Review Board (NRB) Executive Secretary Record of Receipt 


Received (mm/dd/yyyy h:mm am/pm):
9/20/2005 12:00 AM


Status: New 


Reference #: 05-163-E 


Initiator Name: Neil Lemmons 


E-mail: neil.lemmons-l@nasa.gov


Center: JSC


Phone: (281)-244-8080, Ext.


Mail Stop: OM


Short Title: New Method for Updating Mean Time Between Failure


Description: Jay Leggett got a call from Neil Lemmons (JSC) in the ISS program and he is requesting a peer
review of an assessment that he performed (or someone in his group) on using ISS Orbital Replaceable Unit
(ORU) operational data to update Mean Time Between Failure (MTBF) by applying Bayesian methodology.
Updating the MTBF for ORUs is important in setting payload requirements for future missions (ATV, HTV,
SSP). The current flight manifest is being reworked and this analysis could play a role in manifest needs. Neil
mentioned that a manifest decision is targeted for the end of October.

Neil Lemmons would be the initiator. 

Is there anything else that you need on this? This looks like something that would fall in Vickie’s court. 
Thanks. 

Jay 


Source (e.g. email, phone call, posted on web): phone 

Type of Request: assessment 

Proposed Need Date: 10/20/2005 

Date forwarded to Systems Engineering Office (SEO) (mm/dd/yyyy h:mm am/pm):

Section 2: Systems Engineering Office Screening

Section 2.1 Potential ITA/I Identification

Received by SEO (mm/dd/yyyy h:mm am/pm): 9/20/2005 12:00 AM

Potential ITA/I candidate? [X] Yes [ ] No

Assigned Initial Evaluator (IE): Vickie Parsons

Date assigned (mm/dd/yyyy): 9/26/2005

Due date for ITA/I Screening (mm/dd/yyyy): 10/6/2005

Section 2.2 Non-ITA/I Action

Requires additional NESC action (non-ITA/I)? [ ] Yes [ ] No

If yes:

Description of action:

Actionee:

Is follow-up required? [ ] Yes [ ] No If yes: Due Date:

Follow-up status/date: 

If no: 

NESC Director Concurrence (signature): 

Request closure date: 


Section 3: Initial Evaluation

Received by IE (mm/dd/yyyy h:mm am/pm):

Screening complete date:

Valid ITA/I candidate? [ ] Yes [ ] No

Initial Evaluation Report #: NESC-PN-

Target NRB Review Date:

Section 4: NRB Review and Disposition of NCE Response Report

ITA/I Approved: [ ] Yes [ ] No    Date Approved:    Priority: - Select -

ITA/I Lead:    Phone: ( ) - , x

Section 5: ITA/I Lead Planning, Conduct, and Reporting

Plan Development Start Date:

ITA/I Plan #: NESC-PL-

Plan Approval Date:

ITA/I Start Date:    Planned:    Actual:

ITA/I Completed Date:

ITA/I Final Report #: NESC-PN-

ITA/I Briefing Package #: NESC-PN-

Follow-up Required? [ ] Yes [ ] No

Section 6: Follow-up

Date Findings Briefed to Customer:

Follow-up Accepted: [ ] Yes [ ] No

Follow-up Completed Date:

Follow-up Report #: NESC-RP-

Section 7: Disposition and Notification

Notification type: - Select -    Details:

Date of Notification:

Final Disposition: - Select -

Rationale for Disposition:

Close Out Review Date:
Form Approval and Document Revision History


Approved:    NESC Director    Date


Version    Description of Revision    Office of Primary Responsibility    Effective Date
1.0        Initial Release            Principal Engineers Office          29 Jan 04
Appendix B. Details of Calculations and Considerations by Peer
Review Team 

The Bayesian methodology is used to incorporate on-orbit experience into preexisting ORU
reliability models. It is assumed here that the failure distribution of an ORU is exponential,
that is, the ORU has a constant failure rate.¹⁰ The ORU failure rate is denoted by λ and
can be equivalently represented by its inverse, the mean time between failures (MTBF). The
failure rate is usually estimated using piece part reliability models where the failure rates of the
individual parts are obtained from reliability handbooks (e.g., MIL-HDBK-217) and incorporate
part quality, operating environment and temperature, and duty cycle. The Bayesian methodology
models λ as a random variable with a gamma distribution, called the prior distribution. The
gamma density function is


$$f(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda}, \qquad \lambda > 0.$$


The mean of the gamma distribution is α/β and is set equal to the initial ORU failure rate, say λ₀.
By setting β = α/λ₀, the prior distribution is characterized by the single parameter α. The on-orbit
lifetime data of the ORU is incorporated into the model by computing the conditional
distribution of λ given the observed lifetime data. This is called the posterior distribution and
turns out to be a gamma distribution with parameters

$$\alpha_{\text{post}} = \alpha + s \quad \text{and} \quad \beta_{\text{post}} = \beta + t,$$

where s is the number of observed ORU failures and t is the total operating time of the ORU.
The updated failure rate is usually estimated by the mean of the posterior distribution, given by


$$\frac{\alpha_{\text{post}}}{\beta_{\text{post}}} = \frac{\alpha + s}{\beta + t}.$$

The prior density is intended to model our uncertainty in the original failure rate λ₀ of the
component. Figure 1 is a plot of the prior density function for several values of α, each with a
mean equal to λ₀.
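
The update described above reduces to a few lines of arithmetic. The sketch below is a minimal illustration, assuming the gamma prior with β = α/λ₀; the function name and the prior MTBF of 100,000 hours are hypothetical, while the operating time of 236448 hours with no failures and the two α values are taken from this report.

```python
def bayes_update(alpha, prior_rate, failures, hours):
    """Return (alpha_post, beta_post, posterior mean failure rate).

    The prior is Gamma(alpha, beta) with beta = alpha / prior_rate, so that
    the prior mean equals prior_rate.
    """
    beta = alpha / prior_rate
    alpha_post = alpha + failures
    beta_post = beta + hours
    return alpha_post, beta_post, alpha_post / beta_post


prior_mtbf = 100_000.0  # hypothetical initial MTBF estimate, in hours
for alpha in (2.2068, 0.6065):
    _, _, rate = bayes_update(alpha, 1.0 / prior_mtbf, failures=0, hours=236448.0)
    print(f"alpha = {alpha}: updated MTBF = {1.0 / rate:,.0f} hours")
```

With no failures, the updated MTBF equals the prior MTBF plus t/α, so the smaller α moves the estimate much further from the prior for the same operating time. This is the sense in which a small α discounts the prior information.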


10 Other Bayesian methods are available for non-constant failure rate distributions. 
Figure 1. Prior Density Functions



From Figure 1 we see the effect of α on the prior density. The smaller α is, the more we bias the
model toward smaller failure rates. The α values for the ISS Program were selected by fitting the
prior distribution to failure rates given in the Fisher-Price report,¹¹ Table 6.1. Table 6.1 in Jean
Ni's Bayesian presentation contains a mean, 5th percentile, median, and 95th percentile of failure
rates for groups of ORUs under a number of assumptions.¹² Because these failure rate
distributions contain the variability of failure rates for groups of different ORUs, the resulting
distribution has a large variance compared to the variance of a single ORU. Fitting an α to
these distributions with large variances will result in values of α that are too small.

Consider the BCDU, which experienced no failures during a total time of 236448 hours.¹³

Using α=0.6065 for electronic components, there is 95 percent confidence that the MTBF will be
at least 137076 hours.¹⁴ In contrast, using classical statistics based on the operational data only,
the 95 percent confidence lower bound based on the χ² statistic is 78928.3 hours. The Bayesian
procedure is designed to adjust an initial estimate based on the operational data, which implies that
the updated result will be somewhere between the prior and operational predictions; this is clearly
violated here. For comparison, the lower limit of α=1.368 for a “reasonable value”


11 Space Station Freedom External Maintenance Task Team, Final Report. NASA, July 1990.

12 Ni, Jean. Bayesian - Final PowerPoint Presentation - August 29, 2005, ISS Program Office. 

13 Womack, James. “Tracking and Data Relay Satellite Reliability Models.” The Aerospace Corporation, Report
No. TOR-2005 (2141) - 3876, 26 January 2005. (Slide 11)

14 “Bayesian Methodology and How It Could Apply to ISS-USOS External ORU MTBFs.” Saber, August 2005.
(Slide 8)
recommended by the HST program gives a 95 percent confidence lower bound for MTBF of 60775
hours, while the value α=2.2068 selected by HST yields 37674.7 hours in this situation.
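
The classical figure of 78928.3 hours quoted above can be reproduced directly. The sketch below assumes the standard one-sided bound for a time-truncated test with zero failures, MTBF ≥ 2T/χ²(confidence, 2 degrees of freedom); the report states only that the bound is based on the χ² statistic, so the exact formula is an assumption. With 2 degrees of freedom the χ² quantile has the closed form −2 ln(1 − confidence), so no statistics library is needed.

```python
import math


def classical_mtbf_lower_bound_zero_failures(hours, confidence=0.95):
    """One-sided lower confidence bound on MTBF when no failures were observed."""
    # Chi-square quantile with 2 degrees of freedom: -2 * ln(1 - confidence)
    chi2_quantile = -2.0 * math.log(1.0 - confidence)
    return 2.0 * hours / chi2_quantile


bound = classical_mtbf_lower_bound_zero_failures(236448.0)
print(f"{bound:.1f} hours")
```

The result agrees with the 78928.3 hours given in the text for the BCDU operating time of 236448 hours.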

The main concern with the grouping of distinct units into broad categories is that the bounds in
the Fisher-Price report reflect variability among different ORUs and not an uncertainty about the
failure rate for a given ORU.¹⁵


category       λ (50th)       λ (95th)       λ (95th)/λ (50th)

electrical     3.33333E-07    7.41174E-06    22.23523395
electronic     4.46828E-06    3.92866E-05    8.792331264
electromech    4.87876E-06    1.96078E-05    4.019019608
mechanical     4.02253E-07    7.36377E-06    18.30633284


Table 1. MTBF Variability Due to Grouping ORUs into Four Categories Based on
MADS

The distinction between the two is important since the former is an artifact of the grouping into a
category, rather than a property of an individual ORU. If the contribution to the resulting bounds
from the variability between different ORUs within a category is significant, the result is an
excessively wide confidence interval, a smaller value of α, and unjustified discounting of the
importance of the prior. This effect can potentially explain the unusually small values of α that
were obtained from the “synthesis data.”


15 Space Station Freedom External Maintenance Task Team, Final Report. NASA, July 1990. 
                             λ(5th)     λ(50th)    λ(95th)    λ(95th)/λ(50th)   α

syntheses
  electrical                 1.45E-08   4.08E-07   1.15E-05   2.82E+01          0.2481
  electronic                 2.06E-07   1.39E-06   9.40E-06   6.76E+00          0.6065
  electro_mech               2.31E-08   1.19E-06   6.17E-05   5.18E+01          0.1972
  mechanical                 2.22E-08   1.16E-06   6.09E-05   5.25E+01          0.1963

work package
  electrical                 5.10E-07   1.84E-06   6.62E-06   3.60E+00          1.3024
  electronic                 4.70E-07   3.19E-06   2.16E-05   6.77E+00          0.6056
  electro_mech               7.93E-08   2.01E-06   5.07E-05   2.52E+01          0.2606
  mechanical                 2.75E-07   1.77E-06   1.14E-05   6.44E+00          0.6352

combination
  combination-analog study   1.67E-06   5.35E-06   1.73E-05   3.23E+00          1.5582
  combination-synthesis      2.09E-07   8.42E-07   3.41E-05   4.05E+01          0.2150
  combination-work package   2.45E-07   2.01E-06   1.65E-05   8.21E+00          0.5115


Table 2. Deriving α Based on Fisher & Price Report Data. Taken from Slide 5, Estimation
of Prior Distribution Parameter α Used in Bayesian Methodology-2, September 13,
2005, AI# 103 update¹⁶

Table 1 shows variability within a category based purely on the MTBF provided in MADS.
This should be compared with Table 2, which reproduces the results presented by ISS. It can
be observed that for the electrical and electronic components the numbers from Table 1 are
reasonably close to the numbers from the “syntheses” set of data in Table 2, showing that at
least for these two categories the ratio between the 95th percentile and median values for failure
rate can be attributed to the grouping effects.
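
The comparison between the two tables can be made explicit by recomputing the ratio of the 95th percentile to the median failure rate for each category. The sketch below simply copies the median and 95th percentile values from Table 1 and from the “syntheses” rows of Table 2; the dictionary and function names are illustrative.

```python
# (median, 95th percentile) failure rates per category, copied from Table 1 (MADS)
table1 = {
    "electrical": (3.33333e-07, 7.41174e-06),
    "electronic": (4.46828e-06, 3.92866e-05),
    "electro_mech": (4.87876e-06, 1.96078e-05),
    "mechanical": (4.02253e-07, 7.36377e-06),
}

# The same two percentiles from the "syntheses" rows of Table 2
table2_syntheses = {
    "electrical": (4.08e-07, 1.15e-05),
    "electronic": (1.39e-06, 9.40e-06),
    "electro_mech": (1.19e-06, 6.17e-05),
    "mechanical": (1.16e-06, 6.09e-05),
}


def spread(rates):
    """Ratio of the 95th percentile to the median failure rate."""
    median, p95 = rates
    return p95 / median


table1_ratio = {name: spread(rates) for name, rates in table1.items()}
table2_ratio = {name: spread(rates) for name, rates in table2_syntheses.items()}

for name in table1:
    r1, r2 = table1_ratio[name], table2_ratio[name]
    print(f"{name:13s} Table 1: {r1:6.2f}   Table 2: {r2:6.2f}   Table 2 / Table 1: {r2 / r1:5.2f}")
```

For the electrical and electronic categories the two ratios differ by a factor of about 1.3, while the electromechanical and mechanical categories differ by much more, which matches the text's restriction of the conclusion to the first two categories.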


16 Ibid. 